Marketing Channel ROI Optimization & Budget Allocation¶

Dataset: data/marketing_data.csv (Kaggle: Predict Conversion in Digital Marketing)

This notebook:

  1. Cleans and explores real user-level marketing data.
  2. Runs an A/B test for CampaignType (Awareness vs Conversion) using a two-proportion Z-test.
  3. Mitigates selection bias via Propensity Score Matching (PSM) and re-tests significance.
  4. Computes average ROI and CAC by channel (AOV assumed at $100).
  5. Reconstructs campaign-level data to fit Marginal ROI (M-ROI) quadratic models.
  6. Compares Average ROI vs M-ROI and gives budget recommendations.

Important: This is an observational dataset. Any causal interpretation requires controls (PSM) and assumptions.

In [2]:
# Step 1: Data Import & Cleaning
import pandas as pd
import numpy as np
import plotly.express as px
from statsmodels.stats.proportion import proportions_ztest
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import pairwise_distances
import warnings
warnings.filterwarnings('ignore')

AOV = 100  # Assumed Average Order Value ($)
np.random.seed(42)

# Install dependencies as needed (quote the extras so zsh doesn't treat the brackets as a glob):
%pip install "kagglehub[pandas-datasets]"
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Set the path to the file you'd like to load
file_path = "digital_marketing_campaign_dataset.csv"

# Load the latest version
df = kagglehub.load_dataset(
  KaggleDatasetAdapter.PANDAS,
  "rabieelkharoua/predict-conversion-in-digital-marketing-dataset",
  file_path,
)

print("First 5 records:", df.head())
Note: you may need to restart the kernel to use updated packages.
First 5 records:    CustomerID  Age  Gender  Income CampaignChannel CampaignType      AdSpend  \
0        8000   56  Female  136912    Social Media    Awareness  6497.870068   
1        8001   69    Male   41760           Email    Retention  3898.668606   
2        8002   46  Female   88456             PPC    Awareness  1546.429596   
3        8003   32  Female   44085             PPC   Conversion   539.525936   
4        8004   60  Female   83964             PPC   Conversion  1678.043573   

   ClickThroughRate  ConversionRate  WebsiteVisits  PagesPerVisit  TimeOnSite  \
0          0.043919        0.088031              0       2.399017    7.396803   
1          0.155725        0.182725             42       2.917138    5.352549   
2          0.277490        0.076423              2       8.223619   13.794901   
3          0.137611        0.088004             47       4.540939   14.688363   
4          0.252851        0.109940              0       2.046847   13.993370   

   SocialShares  EmailOpens  EmailClicks  PreviousPurchases  LoyaltyPoints  \
0            19           6            9                  4            688   
1             5           2            7                  2           3459   
2             0          11            2                  8           2337   
3            89           2            2                  0           2463   
4             6           6            6                  8           4345   

  AdvertisingPlatform AdvertisingTool  Conversion  
0            IsConfid      ToolConfid           1  
1            IsConfid      ToolConfid           1  
2            IsConfid      ToolConfid           1  
3            IsConfid      ToolConfid           1  
4            IsConfid      ToolConfid           1  
In [3]:
# Basic sanity checks
print('\nMissing values by column:')
print(df.isna().sum().sort_values(ascending=False).head(20))

print('\nDuplicate CustomerID count:', df.duplicated('CustomerID').sum())
print('Unique customers:', df['CustomerID'].nunique())

# Minimal cleaning: drop rows missing critical fields for core analyses
critical_cols = ['CustomerID','CampaignChannel','CampaignType','AdSpend','Conversion','Age','Income','WebsiteVisits']
df = df.dropna(subset=critical_cols).copy()

# Enforce types
df['Conversion'] = df['Conversion'].astype(int)
for col in ['AdSpend','Age','Income','WebsiteVisits']:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df = df.dropna(subset=['AdSpend','Age','Income','WebsiteVisits'])
df = df[df['AdSpend'] >= 0]

print('Post-clean shape:', df.shape)
Missing values by column:
CustomerID             0
Age                    0
AdvertisingTool        0
AdvertisingPlatform    0
LoyaltyPoints          0
PreviousPurchases      0
EmailClicks            0
EmailOpens             0
SocialShares           0
TimeOnSite             0
PagesPerVisit          0
WebsiteVisits          0
ConversionRate         0
ClickThroughRate       0
AdSpend                0
CampaignType           0
CampaignChannel        0
Income                 0
Gender                 0
Conversion             0
dtype: int64

Duplicate CustomerID count: 0
Unique customers: 8000
Post-clean shape: (8000, 20)

Step 2: Exploratory Data Analysis (EDA)¶

  • Overall conversion rate and user profiles.
  • Channel-level spend and conversions.
  • Cross-profiles by channel (Age, Income).
In [5]:
overall_conv_rate = df['Conversion'].mean()
print(f'Overall conversion rate: {overall_conv_rate:.3f}')

# Age distribution
fig_age = px.histogram(df, x='Age', nbins=30, title='Age Distribution')
fig_age.show()
Overall conversion rate: 0.876
In [6]:
# Income distribution
fig_inc = px.histogram(df, x='Income', nbins=30, title='Income Distribution')
fig_inc.show()
In [7]:
# Channel-level AdSpend and Conversion
channel_agg = df.groupby('CampaignChannel').agg(
    TotalAdSpend=('AdSpend','sum'),
    Conversions=('Conversion','sum'),
    Users=('CustomerID','nunique')
).reset_index()
channel_agg['ConvRate'] = channel_agg['Conversions'] / channel_agg['Users']
fig_spend = px.bar(channel_agg, x='CampaignChannel', y='TotalAdSpend', title='Total AdSpend by Channel')
fig_spend.show()
fig_conv = px.bar(channel_agg, x='CampaignChannel', y='Conversions', title='Total Conversions by Channel')
fig_conv.show()
In [8]:
# Cross-profile by channel
fig_age_ch = px.box(df, x='CampaignChannel', y='Age', title='Age by CampaignChannel')
fig_age_ch.show()
fig_inc_ch = px.box(df, x='CampaignChannel', y='Income', title='Income by CampaignChannel')
fig_inc_ch.show()

Step 3: A/B Testing (Quasi-Experimental)¶

Key Point: This is an observational dataset. Users may self-select or be targeted differently, causing selection bias.

We compare Conversion rates between Awareness (A) and Conversion (B). Then we apply Propensity Score Matching (PSM) using covariates (Age, Gender, Income, WebsiteVisits) and repeat the Z-test on the matched sample.
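Under the hood, the pooled two-proportion Z-test that statsmodels runs can be sketched by hand. The counts below are illustrative, not from the dataset:

```python
import numpy as np
from scipy.stats import norm

def two_prop_ztest(success_b, n_b, success_a, n_a):
    """One-sided pooled two-proportion Z-test (H1: p_b > p_a)."""
    p_b, p_a = success_b / n_b, success_a / n_a
    p_pool = (success_b + success_a) / (n_b + n_a)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_a))
    z = (p_b - p_a) / se
    return z, norm.sf(z)  # survival function gives the one-sided p-value

# Illustrative counts (not from the dataset)
z, p = two_prop_ztest(80, 100, 60, 100)
print(f"Z={z:.3f}, p={p:.4f}")
```

With pooled variance this matches `proportions_ztest(..., alternative='larger')`, which is statsmodels' default behavior.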

In [10]:
# Split groups
A_label, B_label = 'Awareness', 'Conversion'
df_A = df[df['CampaignType'] == A_label].copy()
df_B = df[df['CampaignType'] == B_label].copy()
print('A size:', len(df_A), 'B size:', len(df_B))

# Two-proportion Z-test on raw groups
conv_A_sum = df_A['Conversion'].sum()
conv_B_sum = df_B['Conversion'].sum()
n_A = len(df_A)
n_B = len(df_B)
z_raw, p_raw = proportions_ztest([conv_B_sum, conv_A_sum], [n_B, n_A], alternative='larger')
print(f'Raw Z={z_raw:.3f}, p-value={p_raw:.4f}')
print(f'ConvRate A={conv_A_sum/n_A:.3f}, B={conv_B_sum/n_B:.3f}')
A size: 1988 B size: 2077
Raw Z=8.117, p-value=0.0000
ConvRate A=0.856, B=0.934

Propensity Score Matching (PSM)¶

We estimate the probability of being in B (CampaignType == Conversion) with a logistic regression on covariates, then perform nearest-neighbor matching on the propensity score with a caliper.
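Before trusting the matched comparison, it is standard to check covariate balance. Below is a minimal sketch of the standardized mean difference (SMD) check, with synthetic data standing in for one matched covariate; a common rule of thumb is |SMD| < 0.1 after matching:

```python
import numpy as np
import pandas as pd

def smd(treated: pd.Series, control: pd.Series) -> float:
    """Standardized mean difference: (mean_t - mean_c) / pooled SD."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd if pooled_sd > 0 else 0.0

# Synthetic stand-ins for one covariate (e.g. Age) in the two matched groups
rng = np.random.default_rng(42)
t = pd.Series(rng.normal(43, 12, 1000))
c = pd.Series(rng.normal(45, 12, 1000))
print(f"SMD = {smd(t, c):.3f}")
```

In the notebook, this would be computed per covariate on `df_matched` split by `T`.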

In [12]:
# Prepare data for PSM
df_psm = df[df['CampaignType'].isin([A_label, B_label])].copy()
df_psm['T'] = (df_psm['CampaignType'] == B_label).astype(int)

# Encode Gender simple binary
df_psm['Gender_bin'] = df_psm['Gender'].map({'Male':1,'Female':0}).fillna(0)

covars = ['Age','Income','WebsiteVisits','Gender_bin']
X = df_psm[covars].copy()
y = df_psm['T']

preprocess = ColumnTransformer([
    ('num', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['Age','Income','WebsiteVisits']),
    ('bin', 'passthrough', ['Gender_bin'])
])

logit = LogisticRegression(max_iter=1000)
psm_pipe = Pipeline([('prep', preprocess), ('logit', logit)])
psm_pipe.fit(X, y)
ps = psm_pipe.predict_proba(X)[:,1]
df_psm['pscore'] = ps

# Nearest-neighbor matching on pscore (1:1), simple caliper
treated = df_psm[df_psm['T']==1].copy()
control = df_psm[df_psm['T']==0].copy()
caliper = 0.05

nbrs = NearestNeighbors(n_neighbors=1).fit(control[['pscore']])
dist, idx = nbrs.kneighbors(treated[['pscore']])
match_pairs = []
used_controls = set()
for i, (d, j) in enumerate(zip(dist.flatten(), idx.flatten())):
    if d <= caliper:
        ctrl_index = control.iloc[j].name
        if ctrl_index not in used_controls:
            match_pairs.append((treated.iloc[i].name, ctrl_index))
            used_controls.add(ctrl_index)

matched_t_idx = [t for t, _ in match_pairs]
matched_c_idx = [c for _, c in match_pairs]

df_matched = pd.concat([
    df_psm.loc[matched_t_idx],
    df_psm.loc[matched_c_idx]
])
print('Matched sample size:', df_matched.shape[0])

# Re-run Z-test after PSM
mA = df_matched[df_matched['T']==0]
mB = df_matched[df_matched['T']==1]
z_psm, p_psm = proportions_ztest([mB['Conversion'].sum(), mA['Conversion'].sum()], [len(mB), len(mA)], alternative='larger')
print(f'PSM Z={z_psm:.3f}, p-value={p_psm:.4f}')
print(f'PSM ConvRate A={mA["Conversion"].mean():.3f}, B={mB["Conversion"].mean():.3f}')
Matched sample size: 2240
PSM Z=7.129, p-value=0.0000
PSM ConvRate A=0.850, B=0.942

Step 4: ROI & CAC Analysis (Average Returns)¶

Assumption: Average Order Value (AOV) = $100.

Formulas:

  • TotalRevenue = TotalConversions * AOV (AOV = $100)
  • ROI = (TotalRevenue - TotalAdSpend) / TotalAdSpend
  • CAC = TotalAdSpend / TotalConversions
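As a sanity check, plugging the Email channel's totals (≈$7.87M spend, 1,355 conversions, from the channel table this step produces) into these formulas:

```python
AOV = 100  # assumed Average Order Value ($)

# Email channel totals (rounded from the aggregation in this step)
total_spend = 7_871_576.0
conversions = 1355

revenue = conversions * AOV                   # $135,500
roi = (revenue - total_spend) / total_spend   # spend far exceeds revenue at $100 AOV
cac = total_spend / conversions               # dollars spent per conversion
print(f"ROI={roi:.4f}, CAC=${cac:,.2f}")      # ROI=-0.9828, CAC=$5,809.28
```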
In [14]:
roi_by_channel = df.groupby('CampaignChannel').agg(
    TotalAdSpend=('AdSpend','sum'),
    Conversions=('Conversion','sum'),
    Users=('CustomerID','nunique')
).reset_index()
roi_by_channel['TotalRevenue'] = roi_by_channel['Conversions'] * AOV
roi_by_channel['ROI'] = (roi_by_channel['TotalRevenue'] - roi_by_channel['TotalAdSpend']) / roi_by_channel['TotalAdSpend']
roi_by_channel['CAC'] = roi_by_channel['TotalAdSpend'] / roi_by_channel['Conversions'].replace(0, np.nan)
roi_by_channel
Out[14]:
CampaignChannel TotalAdSpend Conversions Users TotalRevenue ROI CAC
0 Email 7.871576e+06 1355 1557 135500 -0.982786 5809.281071
1 PPC 8.199237e+06 1461 1655 146100 -0.982181 5612.071856
2 Referral 8.653519e+06 1518 1719 151800 -0.982458 5700.605195
3 SEO 7.740904e+06 1359 1550 135900 -0.982444 5696.029342
4 Social Media 7.542323e+06 1319 1519 131900 -0.982512 5718.213229
In [15]:
fig_roi = px.bar(roi_by_channel, x='CampaignChannel', y='ROI', title='Average ROI by Channel')
fig_roi.show()
fig_cac = px.bar(roi_by_channel, x='CampaignChannel', y='CAC', title='CAC by Channel')
fig_cac.show()

Step 5: Marginal ROI (M-ROI) Modeling¶

Reconstruct campaign-level data from the user-level rows. AdSpend, ClickThroughRate, and ConversionRate repeat at the user level because they are campaign attributes, so we group by campaign keys to build campaign-level observations. (Since AdSpend is continuous, most keys isolate a single user, so these reconstructed "campaigns" are a coarse proxy.)

In [17]:
# Define campaign keys (proxy for unique campaigns)
campaign_keys = ['CampaignChannel','CampaignType','AdSpend']
df_campaign_agg = df.groupby(campaign_keys).agg(
    TotalConversions=('Conversion','sum'),
    TotalUsers=('CustomerID','nunique')
).reset_index()
df_campaign_agg['TotalRevenue'] = df_campaign_agg['TotalConversions'] * AOV
df_campaign_agg.head()
Out[17]:
CampaignChannel CampaignType AdSpend TotalConversions TotalUsers TotalRevenue
0 Email Awareness 154.086910 1 1 100
1 Email Awareness 158.583632 0 1 0
2 Email Awareness 183.243466 1 1 100
3 Email Awareness 230.987601 1 1 100
4 Email Awareness 242.547848 1 1 100

Fit Quadratic Revenue ~ AdSpend per Channel¶

We expect a negative quadratic term (β₂ < 0) if a channel experiences diminishing returns at higher spend levels.

In [19]:
def fit_quadratic(sub):
    X = pd.DataFrame({'const':1, 'AdSpend':sub['AdSpend'], 'AdSpend2':sub['AdSpend']**2})
    y = sub['TotalRevenue']
    model = sm.OLS(y, X).fit()
    return model

models = {}
for ch in df_campaign_agg['CampaignChannel'].unique():
    sub = df_campaign_agg[df_campaign_agg['CampaignChannel']==ch]
    if sub['AdSpend'].nunique() >= 3:  # need variance for quadratic fit
        models[ch] = fit_quadratic(sub)
        print(f'Channel={ch}\n', models[ch].summary().tables[1])
    else:
        print(f'Channel={ch}: not enough distinct AdSpend points for quadratic fit.')
Channel=Email
 ==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         79.3828      2.776     28.593      0.000      73.937      84.828
AdSpend        0.0019      0.001      1.521      0.128      -0.001       0.004
AdSpend2   -5.692e-08   1.18e-07     -0.483      0.629   -2.88e-07    1.74e-07
==============================================================================
Channel=PPC
 ==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         80.1566      2.392     33.514      0.000      75.465      84.848
AdSpend        0.0021      0.001      1.825      0.068      -0.000       0.004
AdSpend2   -6.214e-08   1.09e-07     -0.568      0.570   -2.77e-07    1.52e-07
==============================================================================
Channel=Referral
 ==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         80.8806      2.435     33.219      0.000      76.105      85.656
AdSpend        0.0019      0.001      1.753      0.080      -0.000       0.004
AdSpend2   -6.944e-08   1.06e-07     -0.657      0.511   -2.77e-07    1.38e-07
==============================================================================
Channel=SEO
 ==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         81.2586      2.532     32.091      0.000      76.292      86.225
AdSpend        0.0007      0.001      0.555      0.579      -0.002       0.003
AdSpend2    9.474e-08   1.15e-07      0.827      0.408    -1.3e-07    3.19e-07
==============================================================================
Channel=Social Media
 ==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         78.5166      2.703     29.046      0.000      73.214      83.819
AdSpend        0.0017      0.001      1.336      0.182      -0.001       0.004
AdSpend2    1.948e-09   1.21e-07      0.016      0.987   -2.34e-07    2.38e-07
==============================================================================
In [20]:
df_roi = df_campaign_agg.copy()
df_roi = df_roi[df_roi["AdSpend"] > 0]
df_roi["ROI"] = (df_roi["TotalRevenue"] - df_roi["AdSpend"]) / df_roi["AdSpend"]

for ch, sub in df_roi.groupby("CampaignChannel"):
    fig_roi_scatter = px.scatter(
        sub,
        x="AdSpend",
        y="ROI",
        trendline="lowess",
        opacity=0.5,
        color_discrete_sequence=["#1f77b4"],
        title=f"{ch}: ROI vs AdSpend (LOWESS trend)",
    )
    fig_roi_scatter.update_traces(marker=dict(size=5))
    
    ymin = min(-1.05, sub["ROI"].min() - 0.05)
    ymax = max(0.10,  sub["ROI"].max() + 0.05)
    fig_roi_scatter.update_layout(
        template="plotly_white",
        yaxis=dict(title="ROI", zeroline=True, zerolinewidth=1, zerolinecolor="gray", range=[ymin, ymax]),
        xaxis_title="AdSpend ($)",
        title_font=dict(size=18, family="Arial"),
        font=dict(size=12),
        width=850,
        height=400,
    )
    fig_roi_scatter.add_hline(y=0,  line_dash="dot", line_color="green", annotation_text="Break-even (ROI=0)")
    fig_roi_scatter.add_hline(y=-1, line_dash="dot", line_color="red",   annotation_text="Loss limit (ROI=-1)")
    fig_roi_scatter.show()

Compute Marginal ROI (M-ROI)¶

Given the quadratic model:
$$ \text{Revenue} = \beta_0 + \beta_1 \cdot S + \beta_2 \cdot S^2 $$ the marginal ROI at spend level $S$ is: $$ M\text{-}ROI(S) = \frac{d\,\text{Revenue}}{dS} = \beta_1 + 2\beta_2 S. $$ We will compute M-ROI at the current average spend by channel, and compare it to average ROI.
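Plugging in the Email fit from above (coefficients rounded from the OLS table) as a worked example:

```python
# Email quadratic coefficients (rounded from the OLS table above)
b1, b2 = 0.0019, -5.692e-08
S = 5055.6  # Email average spend per campaign

mroi = b1 + 2 * b2 * S   # derivative of b0 + b1*S + b2*S^2 at S
print(f"M-ROI at avg spend: {mroi:.6f}")  # ~0.0013, i.e. ~$0.001 of revenue per extra $1
```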

In [22]:
# Average ROI table (from Step 4) is roi_by_channel
avg_spend_by_channel = df_campaign_agg.groupby('CampaignChannel')['AdSpend'].mean().rename('AvgSpend').reset_index()

rows = []
for ch in roi_by_channel['CampaignChannel']:
    avg_roi = roi_by_channel.loc[roi_by_channel['CampaignChannel']==ch, 'ROI'].values[0]
    avg_spend = avg_spend_by_channel.loc[avg_spend_by_channel['CampaignChannel']==ch, 'AvgSpend'].values
    avg_spend = float(avg_spend[0]) if len(avg_spend)>0 else np.nan
    if ch in models:
        b0 = models[ch].params['const']
        b1 = models[ch].params['AdSpend']
        b2 = models[ch].params['AdSpend2']
        mroi = b1 + 2*b2*avg_spend
    else:
        mroi = np.nan
    rows.append({'CampaignChannel': ch, 'AverageROI': avg_roi, 'AvgSpend': avg_spend, 'MROI_at_AvgSpend': mroi})

compare_df = pd.DataFrame(rows)
compare_df
Out[22]:
CampaignChannel AverageROI AvgSpend MROI_at_AvgSpend
0 Email -0.982786 5055.604272 0.001311
1 PPC -0.982181 4954.221741 0.001435
2 Referral -0.982458 5034.042284 0.001237
3 SEO -0.982444 4994.131533 0.001602
4 Social Media -0.982512 4965.321428 0.001682
In [24]:
cd = compare_df.copy()

x_pad = 0.001
x_min, x_max = cd["AverageROI"].min() - x_pad, cd["AverageROI"].max() + x_pad
y_vals = cd["MROI_at_AvgSpend"].values
y_span = y_vals.max() - y_vals.min()
pad = max(0.0006, y_span * 1.5)
y_min, y_max = y_vals.min() - pad, y_vals.max() + pad


fig = px.scatter(
    cd,
    x="AverageROI", y="MROI_at_AvgSpend",
    color="CampaignChannel", symbol="CampaignChannel",
    size="AvgSpend", size_max=22,
    text="CampaignChannel",
    hover_data={"AverageROI":":.6f", "MROI_at_AvgSpend":":.6f", "AvgSpend":":.0f"},
    title="Average ROI vs M-ROI (zoomed)"
)
fig.update_traces(textposition="top center")

fig.update_xaxes(title="Average ROI", range=[x_min, x_max], tickformat=".5f", zeroline=True)
fig.update_yaxes(title="M-ROI @ avg spend", range=[y_min, y_max], tickformat=".4f", zeroline=True)
fig.add_hline(y=1, line_dash="dot", line_color="gray")
fig.add_vline(x=1, line_dash="dot", line_color="gray")
fig.update_layout(template="plotly_white", height=520)
fig.show()

Step 6: Business Recommendations & Budget Optimization¶

Goal. Turn statistical findings into an actionable budget plan using both Average ROI and Marginal ROI (M-ROI).

6.1 Experiment decision (A/B test → rollout?)¶

  • We tested CampaignType = Conversion vs Awareness using a two-proportion Z-test.
  • Because this is observational data, we repeated the test after Propensity Score Matching (PSM) on (Age, Gender, Income, WebsiteVisits).

Decision rule.

  • If p_raw < 0.05 and p_psm < 0.05 (uplift stable post-PSM): Roll out Conversion more broadly.
  • If only raw is significant but PSM is not: Maintain A/B holdout and collect more data (selection bias likely).
  • If neither is significant: Do not roll out.
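The decision rule above can be encoded directly; `alpha` and the returned labels are our own naming, not from any library:

```python
def rollout_decision(p_raw: float, p_psm: float, alpha: float = 0.05) -> str:
    """Map raw and post-PSM p-values to the rollout decision above."""
    if p_raw < alpha and p_psm < alpha:
        return "roll out Conversion more broadly"
    if p_raw < alpha:
        return "maintain holdout; collect more data (selection bias likely)"
    return "do not roll out"

print(rollout_decision(2.4e-16, 5.1e-13))  # this notebook's case: both tests significant
```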
In [29]:
# --- 6.1 Print A/B decision facts (assumes earlier variables exist) ---

try:
    print("A/B test (raw):   Z = %.3f, p = %.4g" % (z_raw, p_raw))
    print("A/B test (PSM):   Z = %.3f, p = %.4g" % (z_psm, p_psm))
    print("Raw rates:        A = %.3f, B = %.3f" % (conv_A_sum/len(df_A), conv_B_sum/len(df_B)))
    print("PSM rates:        A = %.3f, B = %.3f" % (mA['Conversion'].mean(), mB['Conversion'].mean()))
except NameError:
    print("A/B numbers not in scope. (This does not affect the budget section.)")
A/B test (raw):   Z = 8.117, p = 2.394e-16
A/B test (PSM):   Z = 7.129, p = 5.067e-13
Raw rates:        A = 0.856, B = 0.934
PSM rates:        A = 0.850, B = 0.942

6.2 Read the ROI vs M-ROI matrix¶

  • Average ROI answers: “On average, did the channel pay back?”
  • M-ROI (slope of Revenue(Spend)) answers: “If I add one more $, how many $ of revenue do I get right now?”
    • M-ROI > 1 ⇒ incremental dollars are profitable (revenue > spend).
    • M-ROI < 1 ⇒ over-investment at the margin (each extra $ loses money).
In [31]:
# --- 6.2 ROI vs M-ROI matrix (compare_df expected from Step 5) ---
if 'compare_df' in globals():
    display(compare_df[['CampaignChannel','AverageROI','AvgSpend','MROI_at_AvgSpend']])
    fig = px.scatter(
        compare_df,
        x="AverageROI", y="MROI_at_AvgSpend",
        color="CampaignChannel", size="AvgSpend", size_max=22,
        title="Average ROI vs M-ROI (zoomed)"
    )
    # zoom so points don’t overlap
    x_pad = 0.001
    xs = compare_df["AverageROI"].values
    fig.update_xaxes(range=[xs.min()-x_pad, xs.max()+x_pad], tickformat=".5f", title="Average ROI")
    ys = compare_df["MROI_at_AvgSpend"].values
    y_span = ys.max() - ys.min()
    pad = max(0.0006, y_span * 1.5)
    fig.update_yaxes(range=[ys.min()-pad, ys.max()+pad], tickformat=".4f", title="M-ROI @ avg spend")
    fig.update_layout(template="plotly_white", height=520)
    fig.add_hline(y=1, line_dash="dot", line_color="gray")
    fig.add_vline(x=1, line_dash="dot", line_color="gray")
    fig.show()
else:
    print("compare_df not found; please run Step 5 before Step 6.")
CampaignChannel AverageROI AvgSpend MROI_at_AvgSpend
0 Email -0.982786 5055.604272 0.001311
1 PPC -0.982181 4954.221741 0.001435
2 Referral -0.982458 5034.042284 0.001237
3 SEO -0.982444 4994.131533 0.001602
4 Social Media -0.982512 4965.321428 0.001682

6.3 Budget policy and why we changed the method¶

Important modeling note: In Step 5 we fit quadratics at the campaign level (TotalRevenue vs AdSpend per campaign). Using those same coefficients at channel-total scale (millions of dollars) violates scale consistency for nonlinear models, i.e. f(sum S_i) ≠ sum f(S_i). This can produce nonsense negative “lifts” when simulating large reallocations.
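A tiny numeric check makes the point; the coefficients are illustrative, in the same ballpark as the fits above:

```python
# A quadratic is not additive: f(S1 + S2 + S3) != f(S1) + f(S2) + f(S3)
def revenue(s, b0=80.0, b1=0.002, b2=-6e-08):  # illustrative coefficients
    return b0 + b1 * s + b2 * s**2

spends = [1000.0, 2000.0, 3000.0]
per_campaign = sum(revenue(s) for s in spends)  # sum of campaign-level predictions
at_total     = revenue(sum(spends))             # same coefficients applied to the total
print(per_campaign, at_total)  # 251.16 vs 89.84 -- wildly different
```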

Default method (robust): Local, first-order reallocation¶

We optimize small rotations of budget using current M-ROIs only (no large extrapolation):

$$ \Delta \text{Revenue} \approx \big(\text{M-ROI}_\text{to} - \text{M-ROI}_\text{from}\big)\cdot \Delta $$

subject to:

  • Constant total budget (rotation cap e.g. 5–20%),
  • Learning floors (keep ≥5–10% of each channel’s current spend),
  • Steps in small increments (e.g., 1% of total) to stay local.

Outputs in this notebook

  • plan_local — list of concrete transfers (from → to, amount, estimated first-order gain).
  • alloc_df — new per-channel spends (constant total) and deltas vs current.
  • A first-order estimated lift for the chosen rotation cap.
In [128]:
cur_totals = df_campaign_agg.groupby('CampaignChannel')['AdSpend'].sum().rename('CurrentSpend')
mroi = compare_df.set_index('CampaignChannel')['MROI_at_AvgSpend'].copy()

# Config: how much budget are we allowed to shuffle (e.g., 10% of total)?
TOTAL = cur_totals.sum()
shift_cap = 0.10 * TOTAL         # allow at most 10% rotation
step = 0.01 * TOTAL              # move in 1% steps to keep it local
min_floor = 0.05                 # keep at least 5% of each channel's current spend

alloc = cur_totals.copy().astype(float)
moved = 0.0
records = []
exhausted = set()  # donors already drained to their floor

while moved + 1e-9 < shift_cap:
    candidates = mroi.drop(labels=list(exhausted), errors='ignore')
    if candidates.empty:
        break
    donor = candidates.idxmin()
    recv  = mroi.idxmax()
    if recv == donor or mroi[recv] <= mroi[donor] + 1e-12:   # no gain left
        break

    # measure the floor against the ORIGINAL spend so it doesn't drift as money moves
    max_out = alloc[donor] - min_floor * cur_totals[donor]
    if max_out <= 0:
        exhausted.add(donor)  # donor is at its floor; try the next-lowest M-ROI channel
        continue

    delta = min(step, shift_cap - moved, max_out)
    if delta <= 0: break

    alloc[donor] -= delta
    alloc[recv]  += delta

    # first-order revenue gain from rotating delta dollars
    gain = (mroi[recv] - mroi[donor]) * delta
    moved += delta
    records.append({
        'from': donor, 'to': recv, 'delta': delta, 'estimated_gain': gain,
        'MROI_from': mroi[donor], 'MROI_to': mroi[recv]
    })

plan_local = pd.DataFrame(records)
net_moves = (plan_local.groupby(['from','to'], as_index=False)
                           .agg(total_delta=('delta','sum'),
                                est_gain=('estimated_gain','sum'),
                                steps=('delta','count')))
net_moves = net_moves.sort_values('total_delta', ascending=False)
print("Net transfer summary (aggregated):")
display(net_moves)

print("\nNew allocated spends (keeping total constant):")
alloc_df = alloc.rename('ReallocatedSpend').reset_index().rename(columns={'index':'CampaignChannel'})
alloc_df = alloc_df.merge(cur_totals.rename('CurrentSpend'), on='CampaignChannel', how='left')
alloc_df['Delta_vs_Current'] = alloc_df['ReallocatedSpend'] - alloc_df['CurrentSpend']
display(alloc_df)

if len(plan_local):
    print(f"\nFirst-order estimated revenue lift for {shift_cap/TOTAL:.0%} rotation: "
          f"{plan_local['estimated_gain'].sum():,.0f}")
else:
    print("\nNo profitable first-order move given current M-ROIs.")
Net transfer summary (aggregated):
from to total_delta est_gain steps
0 Referral Social Media 4.000756e+06 1780.185773 10
New allocated spends (keeping total constant):
CampaignChannel ReallocatedSpend CurrentSpend Delta_vs_Current
0 Email 7.871576e+06 7.871576e+06 0.000000e+00
1 PPC 8.199237e+06 8.199237e+06 0.000000e+00
2 Referral 4.652763e+06 8.653519e+06 -4.000756e+06
3 SEO 7.740904e+06 7.740904e+06 0.000000e+00
4 Social Media 1.154308e+07 7.542323e+06 4.000756e+06
First-order estimated revenue lift for 10% rotation: 1,780

6.4 Guardrails & Handoff¶

We validate the plan with a minimum-spend floor per channel (learning/coverage), visualize budget deltas, and export the plan for sign-off. Sensitivity and CUPED are omitted for clarity.

In [130]:
import os  # used by os.makedirs below; not imported in Step 1

assert 'alloc_df' in globals(), "Run Step 6.3 first to generate alloc_df."

# 1) Guardrail check: min floor per channel
MIN_FLOOR_RATIO = 0.05  # keep ≥5% of current spend
guard = alloc_df.copy()
guard["MinFloor"] = guard["CurrentSpend"] * MIN_FLOOR_RATIO
guard["FloorViolated"] = guard["ReallocatedSpend"] < guard["MinFloor"]

print("Guardrail check (Min spend floor):")
display(guard[["CampaignChannel","CurrentSpend","ReallocatedSpend","MinFloor","FloorViolated"]])

# 2) Budget deltas: clean bar chart
fig_delta = px.bar(
    alloc_df.sort_values('Delta_vs_Current'),
    x='CampaignChannel', y='Delta_vs_Current',
    title='Budget Changes by Channel (Δ vs Current)',
    labels={'Delta_vs_Current':'Δ Spend'},
)
fig_delta.update_layout(template='plotly_white', height=420)
fig_delta.add_hline(y=0, line_dash='dot', line_color='gray')
fig_delta.show()

# 3) Export handoff tables (for sign-off)
os.makedirs("outputs/tables", exist_ok=True)
alloc_df.to_csv("outputs/tables/budget_plan.csv", index=False)
if 'compare_df' in globals():
    compare_df.to_csv("outputs/tables/roi_mroi_matrix.csv", index=False)

print("Saved: outputs/tables/budget_plan.csv")
if 'compare_df' in globals():
    print("Saved: outputs/tables/roi_mroi_matrix.csv")
Guardrail check (Min spend floor):
CampaignChannel CurrentSpend ReallocatedSpend MinFloor FloorViolated
0 Email 7.871576e+06 7.871576e+06 393578.792594 False
1 PPC 8.199237e+06 8.199237e+06 409961.849056 False
2 Referral 8.653519e+06 4.652763e+06 432675.934280 False
3 SEO 7.740904e+06 7.740904e+06 387045.193783 False
4 Social Media 7.542323e+06 1.154308e+07 377116.162458 False
Saved: outputs/tables/budget_plan.csv
Saved: outputs/tables/roi_mroi_matrix.csv